ABSTRACT

In the conception and design of intelligent systems, one promising
direction involves the use of fuzzy logic and neural network theory to
enhance such systems’ capability to learn from experience and adapt to
changes in an environment of uncertainty and imprecision. This paper
explores an intelligent control scheme that integrates these
multi-disciplinary techniques. A self-learning system is proposed as an
intelligent controller for dynamical processes, employing a control
policy which evolves and improves automatically. One key component of
the intelligent system is a fuzzy logic-based system which emulates
human decision-making behavior. Another key component consists of
cognitive neural models derived from animal learning theory, which
stimulate memory association and learning behavior. It is shown that
the system can solve a fairly difficult control learning problem.
Simulation results demonstrate that improved learning performance can
be achieved in relation to previously described systems employing
bang-bang control. The proposed system is relatively insensitive to
variations in the parameters of the system environment.

I. INTRODUCTION 

During the past several years, a highly promising direction in the
design of intelligent systems has emerged. More specifically, the
direction in question involves the use of fuzzy logic and neural
network theory to enhance the ability of intelligent systems to learn
from experience and adapt to changes in an environment of uncertainty
and imprecision. This paper provides a brief introduction to a fuzzy
logic-based system [16][17] and cognitive neural models [18][19], and
explores an intelligent control system that integrates these
multi-disciplinary techniques. The approach described here may be
viewed as a step in the development of a better understanding of how to
combine a fuzzy logic-based system with a neural network to achieve a
significant learning / adaptive capability.

A. Why Fuzzy Logic Control? 

There are many complex industrial processes 
which cannot be satisfactorily controlled by conven- 
tional methods due to modeling difficulties and una- 
vailability of quantitative data regarding input- 
output relations. And yet, skilled human operators 
can control such systems quite successfully without 
having any quantitative models in mind. Further- 
more, the operation of many man-machine systems 
requires the use of rules of thumb, intuition, and 
heuristics. All of these features are uncertain and 
imprecise and cannot be addressed adequately by 
conventional methods. As the increasing complex- 
ity and nonlinearity of control systems render con- 
ventional methods less effective, a rule-based sys- 
tem based on fuzzy logic becomes an increasingly 
attractive alternative. 

In fact, during the past several years, rule-based controllers based on
fuzzy logic [16][17] have emerged as one of the most active and
fruitful areas for research in the application of fuzzy set theory
[34]. Among the representative applications of fuzzy logic-based
controllers are the subway system in the city of Sendai [33], container
ship crane control [32], elevator control [4][30], nuclear reactor
control [2][11], automobile transmission control [23], air conditioners
[22], anti-lock brake systems [24] and human-quality robot eyes [5].
Experience shows that a rule-based controller using fuzzy logic makes
it possible to emulate and even surpass the decision-making ability of
a skilled human operator.

Although there is an extensive literature describing various fuzzy
logic-based controllers using approximate reasoning, the acquisition of
the rule base in such controllers is not as yet well understood. In
past applications, fuzzy decision rules have been obtained either from
verbal expressions or from observations of human operator control
actions. Since domain experts and skilled operators do not structure
their decision making in any formal way, the process of transferring
expert knowledge into a usable knowledge base is tedious and
unsystematic. Our research aims at the development of a better
understanding of such problems, with a view to enhancing the potential
of fuzzy logic-based controllers, which can operate effectively in an
environment of uncertainty and imprecision.

One direction that is beginning to be explored is that of the
conception and design of fuzzy systems which have the capability to
learn from experience. In this context, a combination of techniques
drawn from both fuzzy logic and neural network theory may provide a
powerful tool for the design of intelligent systems which combine the
decision-making ability of a skilled human operator with the ability to
learn and adapt to changes in an environment of uncertainty and
imprecision.

B. Why Cognitive Neural Models? 

The theory of animal learning is inferred from observed behavior and
constitutes carefully tested postulates regarding elemental processes
of learning. Recent research into animal learning can be separated into
two categories: the behavioral and neural substrates of learning,
namely, the psychological and physiological levels of learning. One way
to bridge the gap between them is to postulate neural analogies of
behavioral modification paradigms. Hebb’s postulate [9] for synaptic
plasticity was the first to be tried as a neural analogy of associative
learning, which attempted to bridge psychology and neurophysiology. The
theory of adaptive networks originated with [9] and continues to be
influenced by plausible neural analogies of behavioral conditioning
[6][12][7][28][26][29][13][27].

Contemporary artificial neural networks are frequently referred to as
connectionist models, parallel distributed processing (PDP) models, and
adaptive / self-organizing networks. Basically, such a network is a
complex system of neuron-like processing units that operate
asynchronously but in parallel and whose function is determined by the
network topology of connectivity. Artificial neural networks provide a
new computational structure, a plausible approach for information
processing because of their adaptivity / learning as well as massive
parallelism.

Although new learning algorithms and VLSI 
technologies have recently provided strong impetus 
to neural network research, many problems still 
exist. Among them, the comprehensibility of neural
networks, theoretical parsimony / enormous cost, 
and limited empirical successes are some of the 
major issues underlying the limitations of current 
neural networks. The learning behavior of such net- 
works is difficult to understand, and the role of gen- 
eric elements and subnetworks is unclear. Further- 
more, most of these networks lack a theoretical 
foundation. The time and effort required to develop 
neural network architectures (network topology) and 
training is very high. Research has been directed in 
the main at "modeling applications", while relatively 
few "fielded applications" have emerged [3]. Most
of such applications are restricted to pattern recogni- 
tion, categorization, and realizations of associative 
memory. They are still toy research problems at the 
proof-of-concept stage. Among the few exceptions, 
the Adaptive Channel Equalizer (developed by Ber- 
nard Widrow) is perhaps the most commercially 
successful of all neural network applications to date. 
It is a single-neuron device used now in virtually all 
long-distance telephone systems to stabilize voice 
signals [3]. 

Klopf [13] has postulated that, "An intelligent 
system will have to build on a foundation that 
amounts to a highly detailed, immense microscopic 
knowledge base, a knowledge base that can be inter- 
faced effectively with higher functional levels." 
From this perspective, a neural substrate could 
develop into the microscopic knowledge base. The 
macroscopic capabilities of intelligence could then 
be built on top of this. Given the limitations of 
current neural networks, a plausible scheme is to 
incorporate capabilities previously found on the 
macroscopic, network level into the microscopic, 
neuronal (single-neuron) level. 

In this connection, we introduce cognitive 
single-neuron models that coincide with existing 
animal learning theory. Each proposed model pro- 
vides a basis for understanding and explaining 
Pavlovian conditioning [25] [20] and instrumental 
conditioning [20], respectively, which are the best 
understood animal learning processes. In particular, 
one model, an associative critic neuron, captures the 
predictive nature of Pavlovian conditioning, which 
is essential to the theory of adaptive / learning systems.
Another model, an associative learning neuron,
possesses the associative nature of instrumental
conditioning, which stores in memory the temporal 
relationship between input and output. 

C. Outline 

The problem of learning via credit assignment 
[4] is described in Section II. The statement of the 
pole-balancing problem follows. This problem may 
be viewed as a canonical example of dynamic con- 
trol. Some concepts from earlier related work are 
given in Section III. They serve as a basis for com- 
parison of previous and proposed approaches. The 
proposed intelligent system is presented in Section 
IV. Here, a fuzzy logic-based controller is intro- 
duced, and a learning system with cognitive neural 
models is proposed. Computer simulation results are 
described in Section V. The paper closes with a 
concluding remark in Section VI. 

II. A CASE STUDY: 

THE POLE BALANCING PROBLEM 

In machine learning, the problem of learning 
to control physical dynamical systems has been, and 
remains, a challenging goal. In this context, the 
credit-assignment problem is often encountered in 
adaptive problem-solving systems, and is especially 
acute when evaluative feedback is delayed or infrequent.
Basically, the credit-assignment problem is
to determine a strategy for assigning positive credit 
(reward) to desirable actions and negative credit 
(punishment) to undesirable actions in a way that 
would lead to the achievement of a specific goal. In 
what follows, we describe an approach to the build- 
ing of an intelligent rule-based system that can learn 
to control a dynamical system without prior 
knowledge of its input-output relations. 

Our approach focuses on a paradigmatic con- 
trol problem - the pole-balancing problem - which 
has been the object of several studies in the litera- 
tures of control and neural networks. The pole 
balancing system is described as follows. A rigid 
pole is hinged to a cart, which is free to move on a 
one-dimensional track. The pole can rotate in the 
vertical plane of the track and the controller can 
apply an impulsive force of bounded magnitude to 
the cart at discrete time intervals. By balancing the 
pole, we mean that the pole never deviates by more 
than, say, 12 degrees, from the vertical. The equa- 
tions of motion of the cart-pole system are not 
known to the controller, which implies that the 
cart-pole system is treated as a black box. What is
known is a vector describing the cart-pole system’s
state at every time step. If the pole falls, the controller
receives a failure signal. After a failure signal has been
received, the system is reset to its initial state and a 
new attempt is made. On the basis of this evaluative 
feedback, the controller must develop its own con- 
trol strategy and learn to balance the pole for as 
long as possible. Since a failure signal usually 
occurs only after a long sequence of individual con- 
trol decisions, the sparsity of this signal makes the 
credit-assignment problem nontrivial. 

III. PREVIOUS RELATED WORK

There are two noteworthy previous studies 
which have addressed the pole-balancing problem. 
The first is that of Michie and Chambers [21] in 
1968. They constructed a program called BOXES 
that learned to balance the pole by applying two 
opposite constant forces. The second study is that 
of Barto, Sutton, and Anderson [1] in 1983, which 
used neuronlike adaptive elements to solve the same 
problem by using two constant forces. In general, 
both approaches can handle the credit-assignment 
problem that we mentioned. In both, the state space 
is partitioned into several non-overlapping regions 
and no symbolic reasoning techniques are employed. 
Both are limited to only two control actions: push- 
ing the cart left or right with a force of fixed magni- 
tude. The problem is thus one of bang-bang control. 

In contrast to these approaches, we attempt to 
solve the problem through the use of symbolic 
problem-solving techniques, employing a fuzzy 
rule-based controller with approximate reasoning. 
Furthermore, a continuous control scheme is 
employed, namely, the controller can apply a force 
with a magnitude within [-10, +10] newtons. In this 
way, better performance of the controlled system 
may be achieved but the complexity of the problem 
is increased substantially. An overlapping partition 
of the state space forms a linguistic space. The 
overlapping partition enhances the speed of learning 
and robustness. We will have more to say about 
these issues at a later point.

IV. THE INTELLIGENT CONTROL SYSTEM 

Experience shows that a fuzzy logic-based system using approximate
reasoning [16][17] makes it possible to emulate and even surpass the
decision-making ability of a skilled human operator. Neural network
theory [3], in turn, provides a new computational structure, a
plausible approach for information processing because of its
adaptivity / learning as well as massive parallelism. In this
connection, we developed an intelligent control scheme by integrating
human decision-making and animal learning behavior, employing fuzzy
logic and neural network theory.



Fig. 1. Schematic representation of the intelligent system.

As shown in Figure 1, one key component of the intelligent system is a
fuzzy logic-based controller which emulates human decision-making
behavior. Another key component is a neural net. The net is composed of
two cognitive neural models, an associative critic neuron (ACN) and an
associative learning neuron (ALN), derived from animal learning theory,
which stimulate memory association and learning behavior.

As a key component of the intelligent controller, the fuzzy logic-based
system provides a linguistic description of control strategy. It is
composed of a rule base, a fuzzy decoder, decision-making logic, and a
defuzzifier. In general, the rule base describes a control strategy
which has the form of a collection of fuzzy control rules. For example,
if the angle of the pole is positive large and the angular velocity is
positive large, then the applied force is positive large. These rules
are implemented and manipulated using fuzzy set theory [34] and are to
be learned by the proposed neural net. The fuzzy decoder inspects the
incoming system state and fires the rules in parallel. A set of firing
strengths ($x_i$) is then generated and serves as input for the
decision-making logic and the neural net. The decision-making logic,
the inference engine of the system, emulates human decision-making
behavior based on the principles of approximate reasoning [35]. The
defuzzifier takes a fuzzy control decision from the decision-making
logic and determines a non-fuzzy control action ($F$).

The learning capability of the intelligent system is provided by the
associative critic neuron (ACN) and associative learning neuron (ALN).
More specifically, the ACN is derived by using Pavlovian conditioning
theory [25][20]. It captures the predictive nature of Pavlovian
conditioning and has to do with criticism ($\hat{r}$) from the
environment ($r$) associated with the system state ($x_i$). The ALN
derives from the instrumental conditioning theory [20]. It is an
associative memory system, which remembers the temporal relationships
between input ($x_i$) and output ($F$), and associates each fuzzy
control rule with an appropriate fuzzy control action ($F_i$).

A. Fuzzy Logic Control 

In recent years, rule-based controllers employ- 
ing approximate reasoning have emerged as one of 
the most active areas of research in the applications 
of fuzzy set theory. Such reasoning [35] plays an 
essential role in the remarkable human ability to 
make rational decisions in an environment of uncer- 
tainty and imprecision. In essence, approximate rea- 
soning is the process or processes by which a possi- 
bly imprecise conclusion is deduced from a collec- 
tion of imprecise premises. By employing the tech- 
niques of fuzzy set theory [34], approximate reason- 
ing (with precise reasoning viewed as a limiting 
case) can be studied in a formal way. 

The concept of a fuzzy set may be viewed as an extension of an ordinary
(crisp) set. In a fuzzy set, an element can be a member of the set with
a degree of membership varying between 0 and 1. Thus, a fuzzy set $F$
in a universe $U = \{u_i,\ i = 1, \ldots, n\}$ is defined by its
membership function $\mu_F : U \to [0, 1]$. If the $\mu_F(u_i)$ are 0
or 1, the fuzzy set is an ordinary set. As a special case, a fuzzy
singleton is a fuzzy set containing just one element with degree 1.

A concept which plays an important role in the applications of the
theory of fuzzy sets is that of a linguistic variable. To illustrate,
if speed is interpreted as a linguistic variable, that is, a variable
whose values are linguistic labels of fuzzy sets, then the values of
speed might be

$$T(\mathit{speed}) = \{\mathit{slow},\ \mathit{moderate},\ \mathit{fast},\ \mathit{very\ slow},\ \mathit{more\ or\ less\ fast},\ \ldots\}.$$

In a particular context, slow may be interpreted as, say, "a speed
below about 40 mph", moderate as "a speed close to 55 mph" and fast as
"a speed above about 70 mph". Figure 2 shows this interpretation in the
framework of fuzzy sets.

Fig. 2. Diagrammatic representation of various linguistic values of
speed.

The set-theoretic operations on fuzzy sets are defined via their
membership functions. More specifically, let $A$ and $B$ be two fuzzy
sets in $U$ with membership functions $\mu_A$ and $\mu_B$,
respectively. The membership function of the union $A \cup B$ is
defined pointwise for all $u \in U$ by

$$\mu_{A \cup B}(u) = \max\{\mu_A(u),\ \mu_B(u)\}.$$

Dually, the membership function of the intersection $A \cap B$ is
defined pointwise for all $u \in U$ by

$$\mu_{A \cap B}(u) = \min\{\mu_A(u),\ \mu_B(u)\}.$$

If $A_1, \ldots, A_n$ are fuzzy sets in $U_1, \ldots, U_n$,
respectively, the Cartesian product of $A_1, \ldots, A_n$ is a fuzzy
set in the product space $U_1 \times \cdots \times U_n$ with the
membership function

$$\mu_{A_1 \times \cdots \times A_n}(u_1, \ldots, u_n) = \min\{\mu_{A_1}(u_1),\ \ldots,\ \mu_{A_n}(u_n)\}.$$
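The three definitions above can be illustrated on small discrete universes. The following is a minimal sketch, assuming that a fuzzy set is stored as a Python dictionary mapping each element of its universe to a membership degree; the function names and the sample sets are hypothetical and are not taken from the paper.

```python
from itertools import product


def fuzzy_union(mu_a, mu_b):
    """Pointwise maximum of two membership functions on the same universe."""
    return {u: max(mu_a[u], mu_b[u]) for u in mu_a}


def fuzzy_intersection(mu_a, mu_b):
    """Pointwise minimum of two membership functions on the same universe."""
    return {u: min(mu_a[u], mu_b[u]) for u in mu_a}


def cartesian_product(*mus):
    """Membership of each tuple is the minimum of its component memberships."""
    return {
        combo: min(mu[u] for mu, u in zip(mus, combo))
        for combo in product(*mus)  # iterating a dict yields its keys
    }


# Hypothetical sample sets, chosen only for illustration.
A = {1: 0.2, 2: 0.7, 3: 1.0}
B = {1: 0.5, 2: 0.4, 3: 0.0}
C = {"a": 0.6, "b": 0.9}

A_or_B = fuzzy_union(A, B)
A_and_B = fuzzy_intersection(A, B)
A_times_C = cartesian_product(A, C)
```

With these sample sets, the union takes the larger degree at each element and the intersection the smaller one, while the product assigns the pair (2, "a") the degree min(0.7, 0.6).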

Assume that the fuzzy sets $A$, $A'$, $B$, and $B'$ are the linguistic
values of $x$ and $y$. An example of approximate reasoning involving
$x$ and $y$ is the following:

premise 1 : x is A',
premise 2 : if x is A then y is B,
consequent : y is B'.

For instance:

premise 1 : the speed of a car is very high,
premise 2 : if the speed of a car is high
then the probability of an accident is high,
consequent : the probability of an accident
is very high.

This type of fuzzy inference is based on the compo- 
sitional rule of inference for approximate reasoning 
suggested by Zadeh [35]. 

A rule-based controller consists of a set of fuzzy control rules which
are processed through the use of approximate reasoning. For simplicity,
suppose that we have the two rules:

$R_1$ : if x is $A_1$ and y is $B_1$ then z is $C_1$,

or

$R_2$ : if x is $A_2$ and y is $B_2$ then z is $C_2$.

Approximate reasoning, given (x is $A'$) and (y is $B'$), computes the
degree of partial match between the user-supplied facts and the
knowledge rule base as follows.

The degrees of match of ($A_i$ and $A'$) and ($B_i$ and $B'$) are given
respectively by

$$\alpha_i = \max_u \min\{\mu_{A'}(u),\ \mu_{A_i}(u)\},$$

$$\beta_i = \max_v \min\{\mu_{B'}(v),\ \mu_{B_i}(v)\}.$$

The firing strength of the $i$-th rule is given by

$$x_i = \min\{\alpha_i,\ \beta_i\}.$$

Hence, the $i$-th rule recommends a control decision as follows:

$$\mu_{C_i'}(w) = \min\{x_i,\ \mu_{C_i}(w)\}.$$

The consequences of multiple rules can be combined by a
conflict-resolution process which treats the sentence connective or as
a union operator. The combined consequence is then given by

$$\mu_C(w) = \max\{\mu_{C_1'}(w),\ \mu_{C_2'}(w)\}.$$

The combination of consequences is illustrated in Figure 3.
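The inference steps just described (degree of match, firing strength, clipping of each consequent, and combination by union) can be sketched on discretized universes. This is an illustrative sketch only: the dictionary representation, the function names, and the numerical membership values are assumptions introduced here, not data from the paper.

```python
def degree_of_match(mu_fact, mu_rule):
    """max over the universe of min(fact membership, rule membership)."""
    return max(min(mu_fact[u], mu_rule[u]) for u in mu_fact)


def infer(rules, mu_a_fact, mu_b_fact):
    """Combine the clipped consequents of all rules with a pointwise max.

    Each rule is a triple (mu_A_i, mu_B_i, mu_C_i) of membership dictionaries.
    Returns the combined consequent and the list of firing strengths.
    """
    combined = None
    strengths = []
    for mu_a, mu_b, mu_c in rules:
        alpha = degree_of_match(mu_a_fact, mu_a)
        beta = degree_of_match(mu_b_fact, mu_b)
        x = min(alpha, beta)  # firing strength of this rule
        strengths.append(x)
        clipped = {w: min(x, m) for w, m in mu_c.items()}
        if combined is None:
            combined = clipped
        else:
            combined = {w: max(combined[w], clipped[w]) for w in combined}
    return combined, strengths


# Hypothetical linguistic values on three-point universes {0, 1, 2}.
LOW = {0: 1.0, 1: 0.5, 2: 0.0}
HIGH = {0: 0.0, 1: 0.5, 2: 1.0}
rules = [(LOW, LOW, LOW), (HIGH, HIGH, HIGH)]  # R_1 and R_2

A_fact = {0: 0.0, 1: 0.4, 2: 1.0}  # plays the role of A'
B_fact = {0: 0.2, 1: 1.0, 2: 0.6}  # plays the role of B'

mu_C, firing = infer(rules, A_fact, B_fact)
```

For these sample facts the first rule fires with strength min(0.4, 0.5) and the second with min(1.0, 0.6), so the second consequent dominates the upper end of the combined result.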

Fig. 3. Diagrammatic representation of approximate reasoning using
fuzzy input.

In on-line processes, the states of a control system are essential to a
control decision (action). The underlying data are usually obtained
from sensors and are crisp. It may be necessary to convert these data
into the form of fuzzy sets [16]. In practice, however, crisp data are
frequently treated as fuzzy singletons. In this case, the corresponding
inference mechanism is shown in Figure 4.

Fig. 4. Diagrammatic representation of approximate reasoning using
crisp input.

Furthermore, in on-line control, the inference process should lead to a
non-fuzzy control action. This necessitates the use of a defuzzifier. A
defuzzifier can be implemented by using the max criterion, mean of
maximum, or center of area algorithms [17]. The defuzzifier used here
employs the center of area algorithm.

In what follows, the fuzzy control rules are assumed to be of the form

$R_i$ : if x is $A_i$ and y is $B_i$ then z is $C_i$,
$i = 1, 2, \ldots, n$,

where x, y, and z are linguistic variables representing the angle of
the pole with respect to the vertical axis, angular velocity of the
pole, and applied force, respectively; $A_i$, $B_i$, and $C_i$ are the
linguistic values (fuzzy sets) of the linguistic variables x, y, and z
in their respective universes of discourse, [-12, +12] degrees,
$\mathbb{R}$, and [-10, +10] newtons. The definitions of linguistic
values $A_i$ and $B_i$ are shown in Figure 5 (a) and (b). The problem
is to learn the linguistic values $C_i$, which take the form of
triangles, defined on the control force universe [-10, +10] newtons.
The fuzzification concept is illustrated in Figure 5 (c). The location
of the vertex of such a triangle is to be learned, while the
coordinates of the base are functions of the vertex location value,
say in the extreme case, +/- 2 newtons away from that vertex.

Fig. 5. (a) Linguistic values of angle, (b) angular velocity, and (c)
applied force.


To summarize the ideas thus far discussed, the conception of a 2-D
linguistic state space is formed. The x axis is $\theta$ with seven
linguistic values; the y axis is $\dot{\theta}$ with three linguistic
values. Thus, $7 \times 3$ fuzzy control rules are involved. Each fuzzy
control rule corresponds to a fuzzy cell. The premise of a fuzzy
control rule determines the cell’s coordinates in the linguistic state
space. The consequent of the rule is taken to be the content of the
cell, which is to be learned by the proposed neurons, the ALN and ACN.
Once a system input is sensed, the cells are fired in parallel. The
fuzzy decoder takes the current state of the cart-pole system as an
input and has n outputs (firing strengths) going to the ALN and ACN.
Each output of the fuzzy decoder corresponds to a fuzzy cell. The
activity of the output is the firing strength. The firing strength
serves as an input to both the ALN and ACN, and is also used to compute
the recommended control action in each rule (cell).
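The fuzzy decoder and the center of area defuzzifier can be sketched for crisp sensor input. The sketch below rests on several assumptions that are not in the paper: the seven angle labels and three angular-velocity labels are taken to be evenly spaced triangles (the actual shapes are those of Figure 5), the angular velocity is clamped to an assumed range of +/- 30 degrees per second, each consequent triangle has a half-base of 2 newtons, and the center of area is approximated on a uniform grid. All names are hypothetical.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with its vertex at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed label layouts; Figure 5 of the paper defines the real ones.
ANGLE_PEAKS = [-12.0, -8.0, -4.0, 0.0, 4.0, 8.0, 12.0]  # degrees
VELOCITY_PEAKS = [-30.0, 0.0, 30.0]                     # assumed deg/s


def firing_strengths(theta, theta_dot):
    """Fuzzy decoder: one firing strength per cell of the 7 x 3 state space."""
    theta = max(-12.0, min(12.0, theta))          # clamp to the universe
    theta_dot = max(-30.0, min(30.0, theta_dot))  # assumed saturation
    strengths = []
    for a in ANGLE_PEAKS:
        mu_a = triangle(theta, a - 4.0, a, a + 4.0)
        for b in VELOCITY_PEAKS:
            mu_b = triangle(theta_dot, b - 30.0, b, b + 30.0)
            strengths.append(min(mu_a, mu_b))
    return strengths


def center_of_area(strengths, vertices, half_base=2.0, steps=401):
    """Clip each consequent triangle, take the union, return its centroid."""
    num = den = 0.0
    for k in range(steps):
        w = -10.0 + 20.0 * k / (steps - 1)  # grid over [-10, +10] newtons
        mu = max(
            min(x, triangle(w, f - half_base, f, f + half_base))
            for x, f in zip(strengths, vertices)
        )
        num += w * mu
        den += mu
    return num / den if den > 0.0 else 0.0  # no rule fired: apply no force


# Usage: an upright, motionless pole fires only the central cell (index 10).
vertices = [0.0] * 21
vertices[10] = 4.0
force = center_of_area(firing_strengths(0.0, 0.0), vertices)
```

Because crisp inputs are treated as fuzzy singletons, the degree of match reduces to evaluating each label at the measured value, which is why no max-min search appears in the decoder.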

B. Learning with a Neural Net 

As has been mentioned in Section II, the prin- 
cipal difficulty in the learning process is that the 
training information (failure signal) is very sparse. 
Many of the previously employed neural networks 
such as the Adaline, perceptrons, and Hopfield nets, 
are effective for the solution of supervised pattern 
classification problems. In contrast, our network 
consists of the ACN and ALN which perform unsu- 
pervised learning. The ACN has to do with the cri- 
ticism from the environment associated with the sys- 
tem state. The ALN takes the criticism and associ- 
ates n fuzzy control actions with n fuzzy cells (the 
consequents of n fuzzy control rules). Since the 
ACN predicts the criticism at every time step, the 
ALN can continuously update itself before the 
failure signal occurs. This is the basis for the solu- 
tion of the credit-assignment problem. 

1. ACN 

The ACN is derived from Pavlovian condi- 
tioning theory [25] [20]. The best known example of 
Pavlovian conditioning comes from Pavlov’s 
research on the conditioned reflex of salivation by 
dogs. Prior to conditioning, when a dog hears the 
sound of a bell, it pricks its ears. Then, when the 
food is presented to it, it salivates. If this sequence 
of events is repeated, the dog soon starts to salivate 
in reaction to the sound of the bell. In effect, the 
dog has been "conditioned" to react to the bell. As 
can be seen, the sound of a bell can be used to 
predict the occurrence of salivation before the pres- 
ence of food. This predictive relationship between 
food and the sound of a bell has important implica- 
tions. Thus, the ACN captures this predictive nature 
of the Pavlovian conditioning. 

The correspondence between Pavlovian conditioning and the behavior of
our system is as follows. Food corresponds to the evaluative feedback
(failure signal). The salivation by reflex is equivalent to an external
reinforcement $r(t)$ with the value -1.0 if a failure signal occurs,
otherwise 0.0. The sound of a bell relates to the $i$-th fired fuzzy
cell (fuzzy control rule) with firing strength $x_i$. The salivation
resulting from the bell’s sound is the predictive reinforcement
$v_i(t)$ of the $i$-th fuzzy cell. It is worth noting that, in the
extreme, the $i$-th rule with firing strength either 1.0 or 0.0 is the
exact case of presence or absence of a bell’s sound in the conditioning
of a Pavlov dog. In other words, our ACN operates in a continuous mode,
which treats Pavlovian conditioning as a special case. In effect, the
ACN attempts to predict the reinforcement $v_i(t)$ that can eventually
be obtained from the environment by choosing a control action for that
fuzzy cell.

As an extension of the single-input/single-output analogy, multiple
inputs in the ACN necessitate an output which is a weighted sum of the
predictive reinforcements of all fired fuzzy cells. The weighted sum
$p(t)$ is the total reinforcement of all fired fuzzy cells at time $t$.
Furthermore, an internal reinforcement $\hat{r}(t)$, the criticism, is
generated as a temporal difference of the total predictive
reinforcements.

As shown in Figure 1, the ACN has an external reinforcement input,
$r(t)$, from the cart-pole system, n inputs, $x_i(t)$,
$i = 1, \ldots, n$, from corresponding fuzzy cells, and an output,
$\hat{r}(t)$, as internal reinforcement signal (criticism) for the ALN
and itself. The total reinforcement at time $t$ is given by

$$p(t) = G\Big(\sum_{i=1}^{n} v_i(t)\, x_i(t)\Big),$$

where $G$ could be a sigmoid-shaped function, identity function, mean
of maximum algorithm or center of area algorithm. The associative
learning rule for the $i$-th fuzzy cell is in part characterized by a
local memory trace $\bar{x}_i(t)$ and the internal reinforcement
$\hat{r}(t)$. The predictive reinforcement $v_i(t)$ of the $i$-th fuzzy
cell (fuzzy control rule, fuzzy system state) is updated by

$$v_i(t+1) = v_i(t) + \beta\, \hat{r}(t)\, \bar{x}_i(t),$$

where $\beta$ is a positive learning-rate parameter. The local memory
trace is defined by

$$\bar{x}_i(t+1) = \lambda\, \bar{x}_i(t) + (1 - \lambda)\, |x_i(t)\, v_i(t)|,$$

where $\lambda$, $0 \le \lambda < 1$, is a trace-decay parameter. The
trace takes the form of an exponential curve. It is strengthened by the
degree of firing strength of the fuzzy cell (fuzzy control rule)
together with its current weight, and weakened if the rule is not
fired. The trace thus keeps track of how long ago the $i$-th fuzzy
control rule fired and also how often it was fired. The internal
reinforcement is calculated as

$$\hat{r}(t) = r(t) + \gamma\, p(t) - p(t-1),$$

where $\gamma$, $0 < \gamma < 1$, is a discount-rate parameter. The
internal reinforcement serves as criticism, depending on a relative
difference of $p(t)$ and $p(t-1)$. If the pole does not fall and
$\gamma p(t) > p(t-1)$, then $r(t) = 0$ and $\hat{r}(t) > 0$, and a
reward is given. If the pole does not fall and $\gamma p(t) < p(t-1)$,
then $r(t) = 0$ and $\hat{r}(t) < 0$, and a punishment is effected. The
discount factor $\gamma$ implies a bias for the condition in which
$p(t)$ equals $p(t-1)$. More specifically, when the pole does not fall
and remains in the same state, a reward is given through the use of the
discount factor. On the other hand, if the pole falls, then
$p(t) = 0$, $r(t) = -1$ and $\hat{r}(t) < 0$, and a punishment is
issued. If $p(t-1)$ fully predicts the occurrence of the failure, there
is no punishment. As shown, a negative feedback mechanism is implicitly
incorporated into the internal reinforcement.
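One time step of the ACN equations can be sketched as follows. This is a hedged illustration rather than the authors' implementation: $G$ is taken to be the identity function, the class and parameter names are hypothetical, and the predictive reinforcements are initialized to a small nonzero value (an assumption made here, since with the trace rule above a weight of exactly zero would never produce a trace and so would never change).

```python
class ACN:
    """Sketch of the associative critic neuron with G taken as the identity."""

    def __init__(self, n, beta=0.5, gamma=0.9, lam=0.8, v0=0.1):
        self.beta = beta        # learning rate
        self.gamma = gamma      # discount rate
        self.lam = lam          # trace-decay parameter
        self.v = [v0] * n       # predictive reinforcements (assumed nonzero start)
        self.trace = [0.0] * n  # local memory traces
        self.prev_p = 0.0       # p(t-1)

    def step(self, x, failed):
        """Process firing strengths x at time t and return the criticism."""
        r = -1.0 if failed else 0.0  # external reinforcement
        # Total reinforcement; set to zero when the pole has fallen.
        p = 0.0 if failed else sum(vi * xi for vi, xi in zip(self.v, x))
        r_hat = r + self.gamma * p - self.prev_p
        # Both updates use the values held at time t.
        new_v = [vi + self.beta * r_hat * ti
                 for vi, ti in zip(self.v, self.trace)]
        new_trace = [self.lam * ti + (1.0 - self.lam) * abs(xi * vi)
                     for ti, xi, vi in zip(self.trace, x, self.v)]
        self.v, self.trace, self.prev_p = new_v, new_trace, p
        return r_hat


# Usage: one uneventful step followed by a failure, with only cell 0 firing.
acn = ACN(n=2)
r_hat_1 = acn.step([1.0, 0.0], failed=False)
r_hat_2 = acn.step([1.0, 0.0], failed=True)
```

In this run the first step yields a small positive criticism because the prediction rose from zero, and the failure on the second step lowers only the weight of the cell whose trace is nonzero.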

The proposed ACN model might be viewed as 
an extension of the Sutton-Barto model [18]. More 
specifically, in the context of animal learning 
phenomena, a sigmoid-shaped acquisition curve is 
observed. This is not simulated in the Sutton-Barto 
model. In our model, it can be achieved by making 
a change in the associative strength proportional to 
the current associative strength [18]. It has been 
demonstrated by computer simulation that the ACN 
accounts for many phenomena observed in Pavlovian conditioning, such
as a sigmoid-shaped acquisition curve, inter-stimulus interval effects,
trace conditioning, and delay conditioning. A more detailed discussion
of this aspect of our model is given elsewhere [18].

2, ALN 


lowing: the I th fuzzy control rule can produce 
correct control force of the I th rule under the inter- 
nal reinforcement from the ACN. In effect, the 
ALN is a content-addressable memory system which 
associates each fuzzy control rule with an appropri- 
ate fuzzy control action. 

As shown in Figure 1, the ALN has an inter- 
nal reinforcement input, r(r), from the ACN, n 
inputs, jq(f), i= 1, ..., n , from the fuzzy decoder, a 
control action input. Fit), from the defuzzifer, and 
n associative weights i=l, n , as outputs for 
the rule base. Each associative weight w,(0 is 
transformed - by using the concepts of dynamical 
normalization and fuzzification - into a fuzzy set 
having the form of a triangle as described in the 
previous section. Symbolically, 

Fi(t) - fuzzifier (fft)). 

where /,-(#) is the location of the vertex of the trian- 
gle. It is given by 

flit) = HiWiit) + noise(t)), i = 1, ..., n , 

where H is a dynamic sigmoid function which may 
be viewed as a dynamic normalization function and 
provides a continuous output within the range [- 
10, +10]. For the purpose of computer simulation, 
the following function is used: 


lOx 

x>0. 

T(t)+x 


E 

x=0. 

10* 


T(thx 

x<0. 


where $T(t) = k_1 \max_i |w_i(t)|$ is an offset-tuning
parameter which determines the slope of the
sigmoid-shaped curve, and $k_1$ is a constant.

The ALN is derived from the instrumental conditioning
theory [20]. A simple example is teaching a dog to
perform a trick. During training, if the dog does well,
it is given a reward. If not, it is punished. After
training, the dog has learned a new trick. The
association of the dog’s response and reinforcement has
in effect been "conditioned". The correspondence between
this conditioning and the ALN is as follows. A dog
corresponds to the $i$-th fuzzy control rule with firing
strength $x_i$. The response of the dog relates to the
control force of the $i$-th rule. The reinforcement as
reward/punishment is equivalent to the internal
reinforcement from the ACN. The ALN does the fol-

The associative learning rule for each $w_i(t)$ is

$$w_i(t+1) = w_i(t) + G(t)\,\hat{r}(t)\,e_i(t),$$

where $G$ is a dynamical positive learning-rate
parameter with an initial value $\alpha$, and $k_2$ is a
weight-freeze parameter. The weight-freeze parameter
determines the decreasing rate of the dynamical learning
rate $G$, and $\hat{r}(t)$ is the criticism from the ACN.
The associativity trace, $e_i(t)$, is given by

$$e_i(t+1) = \delta e_i(t) + (1 - \delta) F(t) x_i(t),$$

where $\delta$, $0 \le \delta < 1$, is another
trace-decay parameter. The associativity trace takes the
form of an exponential and it remembers for how long and
how often a fuzzy control rule has fired as well as what
control action was taken at that time.
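
The ALN computations described in this section can be
illustrated with a minimal Python sketch. It assumes
Gaussian exploration noise and takes the current learning
rate G(t) as an argument, because the schedule by which
k_2 decreases it is not reproduced here. The names
dynamic_sigmoid, aln_step, and noise_std are illustrative
and do not come from the paper.

```python
import random


def dynamic_sigmoid(x, T):
    """Dynamic normalization H(x); the output stays within [-10, +10]."""
    if x > 0:
        return 10.0 * x / (T + x)
    if x < 0:
        return 10.0 * x / (T - x)
    return 0.0


def aln_step(w, e, x, F, r_hat, G, k1=0.15, delta=0.9, noise_std=0.0):
    """One ALN update over all n rules.

    w: associative weights w_i(t)      e: associativity traces e_i(t)
    x: firing strengths x_i(t)         F: control action F(t)
    r_hat: criticism from the ACN      G: current learning rate G(t)
    noise_std is an assumed Gaussian noise level (hypothetical).
    Returns (f, w_next, e_next).
    """
    # Offset-tuning parameter T(t) = k1 * max_i |w_i(t)|.
    T = k1 * max(abs(wi) for wi in w)
    # Vertex locations f_i(t) = H(w_i(t) + noise(t)).
    f = [dynamic_sigmoid(wi + random.gauss(0.0, noise_std), T) for wi in w]
    # Learning rule w_i(t+1) = w_i(t) + G(t) * r_hat(t) * e_i(t).
    w_next = [wi + G * r_hat * ei for wi, ei in zip(w, e)]
    # Trace e_i(t+1) = delta * e_i(t) + (1 - delta) * F(t) * x_i(t).
    e_next = [delta * ei + (1.0 - delta) * F * xi for ei, xi in zip(e, x)]
    return f, w_next, e_next
```

In this sketch, when all weights are zero, as at the start
of a trial, T(t) = 0 and any nonzero noise sample is mapped
to +10 or -10, so the earliest control actions are driven
entirely by the noise term.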



Fig. 6. Signal flow of the intelligent control system. 

Figure 6 illustrates the signal flow of our proposed
controller during a learning process. In principle, once
a system state is sensed, the set of fuzzy control rules
is fired in parallel. A set of firing strengths ($x_i$)
is then generated and serves as input to both the ALN
and the ACN. The information about the system state is
thus fed into the two neuronlike elements by the set of
firing strengths. The firing strength together with the
predictive reinforcement ($v_i$, or desirability) of the
fuzzy rule generates the local memory trace
($\bar{x}_i$, desirability trace) of the $i$-th fuzzy
rule. The total reinforcement, $p$, or equivalently, the
desirability of all fired fuzzy cells, is computed based
on the firing strength and the reinforcement
(desirability) of each fuzzy rule. A non-fuzzy control
action, $F$, is determined after the processes of
inference combining and defuzzification. The control
action, $F$, together with the firing strength, $x_i$,
of each rule contributes to the associativity trace,
$e_i$, of each rule. After applying the control action
to the plant, a goal evaluation, $r$, is made, which
takes binary values. Based on the yes-no evaluation, the
criticism, $\hat{r}$, which is a more informative
evaluation, is generated. It plays an important role in
the solution of the credit-assignment problem. The
weights ($v_i$, $w_i$) in the learning rules are thus
updated on the basis of the criticism and their own
local memory traces ($\bar{x}_i$, $e_i$). A fuzzy
control force in each rule is generated from $w_i$ by
the use of dynamic normalization and fuzzification.
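
The critic side of this signal flow can be sketched in
Python as well. The ACN update equations are not restated
in this section, so the sketch substitutes the adaptive
critic element of Barto, Sutton, and Anderson, which uses
parameters with the same names (beta, gamma, lambda). It
should be read as an assumption about the form of the ACN,
not as the authors' exact rule, and all function and
argument names are illustrative.

```python
def critic_step(v, xbar, x, r, p_prev,
                beta=0.5, gamma=0.95, lam=0.8, failed=False):
    """One adaptive-critic update (assumed Barto-Sutton-Anderson form).

    v: predictive reinforcements v_i   xbar: desirability traces
    x: firing strengths x_i(t)         r: external reinforcement r(t)
    p_prev: prediction p(t-1) from the previous step
    Returns (r_hat, p, v_next, xbar_next).
    """
    # Total predicted reinforcement p(t); taken as 0 in a failure state.
    p = 0.0 if failed else sum(vi * xi for vi, xi in zip(v, x))
    # Criticism (internal reinforcement) built from the yes-no signal r.
    r_hat = r + gamma * p - p_prev
    # Each weight moves in proportion to the criticism and its own trace.
    v_next = [vi + beta * r_hat * xb for vi, xb in zip(v, xbar)]
    # Exponentially decaying memory of how recently each rule fired.
    xbar_next = [lam * xb + (1.0 - lam) * xi for xb, xi in zip(xbar, x)]
    return r_hat, p, v_next, xbar_next
```

The returned r_hat is the quantity that the ALN learning
rule would consume, which is how a binary failure signal
can still provide graded feedback at every time step.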

V. SIMULATION RESULTS 

We implemented our system on a Sun workstation. For
comparison purposes, we also implemented Barto’s system
[1] for solving the same problem. The masses of the cart
and the initial pole were 1.0 kg and 0.1 kg,
respectively. The length of the pole was 1.0 meter. The
parameter values used in our simulation were:
$\alpha = 1000$, $\beta = 0.5$, $\gamma = 0.95$,
$\delta = 0.9$, $\lambda = 0.8$, $\epsilon = 0.1$,
$k_1 = 0.15$, and $k_2 = 2500$. A run was called a
"success" whenever the number of steps before failure
was greater than 60,000. The external reinforcement
$r(t)$ was -1 when the failure signal occurred;
otherwise, it was 0. Every trial began with the same
initial cart-pole state, $\theta = 0$,
$\dot{\theta} = 0$, $x = 0$, $\dot{x} = 0$, and ended
with a failure signal when $|\theta| > 12$ degrees. All
memory traces, $\bar{x}_i$ and $e_i$, were set to zero.
All the weights $w_i$ were set to zero, and a lower
bound $v_i$ (= -0.0001) was set to all the weights. In
testing the performance of the system, the simulator was
run by applying the Adams predictor-corrector method
with a time step of 20 ms [19].
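
For readers who want to reproduce this setup, the plant can
be sketched as follows. The equations of motion are the
standard frictionless cart-pole model, which is an
assumption because the paper does not restate them here,
and forward Euler integration is used for brevity in place
of the Adams predictor-corrector method. Only the angle
failure criterion given above is checked, and all names are
illustrative.

```python
import math

GRAVITY = 9.8             # m/s^2 (assumed value)
CART_MASS = 1.0           # kg, from the simulation setup
POLE_MASS = 0.1           # kg, from the simulation setup
POLE_HALF_LENGTH = 0.5    # m, half of the 1.0 m pole length
DT = 0.02                 # s, the 20 ms time step
FAIL_ANGLE = math.radians(12.0)


def cartpole_step(state, force):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step."""
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Shared term of the frictionless cart-pole equations.
    temp = (force
            + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t
            ) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH
        * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / total_mass
    return (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )


def external_reinforcement(state):
    """r(t): -1 on failure (|theta| > 12 degrees), otherwise 0."""
    return -1.0 if abs(state[2]) > FAIL_ANGLE else 0.0
```

A trial in this sketch would start from the state
(0.0, 0.0, 0.0, 0.0) and call cartpole_step once per control
action until external_reinforcement returns -1.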



Fig. 7. Learned control surface based on the proposed
intelligent system with COA defuzzifier.

A. Learning / Training 

The proposed controller and Barto’s system are both
capable of learning to balance the pole. However,
experiments show that our system has better learning
performance [19]. The proposed controller learns to
balance the pole in 6 trials with the COA defuzzifier.
Figure 7 illustrates the learned control surface based
on our intelligent system employing the center of area
(COA) defuzzifier. Barto’s system took, on average, 27
trials to balance the pole [19].

Additional observations were made on the 
state trajectory of the angle of the pole with respect 
to the vertical axis. We observed the data after the 
systems learned their own control strategy. The data 
showed that, in every case, our controller could 
keep the angle within a smaller region compared 
with Barto’s. Figure 8 illustrates one set of these 
data from our system and Barto’s, respectively. 



Fig. 8. (a) State performance of the pole angle based
on the proposed controller. (b) State performance of
the pole angle based on Barto’s system.


B. Adaptation 


Adaptation is intended to adjust to unforeseen 
changes in environmental conditions using prior 
knowledge. Training involves constructing a 
knowledge base of an application domain (e.g. a 
pole-balancing task) with little a priori domain 
knowledge. The capability of learning to solve new 
tasks by modifying previously learned knowledge
(adaptation) is compared with that of starting from 
scratch (training). Extensive simulation studies of 
such schemes have been carried out. They show that 
the proposed controller tolerates a wide range of 
uncertainty as well as a lack of system information, 
e.g., parameter changes in the length and mass of 
the pole, changes of failure criteria, and a slanted 
cart-pole system. 

The adaptation experiments were based on pre-learned
knowledge, employing the same parameter settings as
those in the last section. The length and mass of the
pole were 1.0 m and 0.1 kg, the angle constraint for
failure evaluation was +/-12°, and the initial value of
the angle of the pole with respect to the vertical axis
was 0.0°. The system took 6 trials to learn the task.

In the first set of experiments, the system was required
to adapt to changes in the length and mass of the pole.
Six experiments were performed. The first two increased
the original mass of the pole by a factor of 10 and 20,
respectively. The third and fourth changed the original
length of the pole by a factor of 2 and 1/2,
respectively. The last two replaced the original pole by
two shorter poles: the length and mass of the first pole
were reduced to 2/3 of the original values, and those of
the second to 1/4. Without pre-training, the system took
10, 15, 5, 11, 8, and 6 trials to learn these tasks.
However, with the pre-trained knowledge, the system
successfully completed these tasks without any further
trials. This result shows the robustness of the proposed
intelligent system.

In the second set, we added a more severe constraint on
the angle of the pole for failure evaluation. The angle
constraints were changed from +/-12° to +/-6°, +/-3°,
and +/-1°, respectively. The system needed 4 and 6
trials to learn the first two tasks with no initial
knowledge, but it failed in the last task since a finer
partition of the input space is required. With
pre-training, however, the system adapted to all tasks
without further trials.

In the third set, the system was required to adapt to
changes in the length and mass of the pole (by a factor
of 1/2) and the angle constraint. The training took 6
trials, while adaptation handled the new task well.

Finally, the cart-pole system was lifted at the right
end in such a way that the base of the system and the
surface of the table formed an angle of 12°. Without
pre-training, the system took 10 trials to balance the
pole. However, the system with the trained knowledge
needed no further trials to complete the new task.

VI. CONCLUDING REMARK 

In this article, we have proposed a symbolic
problem-solving approach to a class of learning control
problems. More specifically, we have attempted to
develop an intelligent control scheme by integrating
human decision-making with a fuzzy logic-based system
and animal learning behavior with cognitive neural
models. The proposed intelligent control system learns
and improves its rule base for a better control strategy
from experience and adapts to changes in an environment
of uncertainty and imprecision. In this way, we avoid an
ad hoc rule-tuning process, which is usually inefficient
and lacking in consistency. It has been shown that the
proposed intelligent system achieves better learning
speed and system behavior than previous approaches.
Furthermore, the system is quite robust. The controller
is relatively insensitive to variations in the
parameters of the system environment, e.g., in the
context of pole-balancing, changes in the length and
mass of the pole, failure criteria, and slanting the
base of the cart-pole system. In addition, the
controller can be primed with pre-trained control
knowledge, which minimizes rapid changes during
adaptation.

The approach described in this paper may be 
viewed as a step in the development of a better 
understanding of how to combine a fuzzy logic-based
system with a neural network to achieve a
significant learning capability. We plan to address 
various aspects of this important issue in subsequent 
papers. 